-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ring: add GetWithOptions method to adjust per call behavior #632
base: main
Are you sure you want to change the base?
Conversation
@@ -1332,36 +1420,3 @@ func (op Operation) ShouldExtendReplicaSetOnState(s InstanceState) bool { | |||
|
|||
// All states are healthy, no states extend replica set. | |||
var allStatesRingOperation = Operation(0x0000ffff) | |||
|
|||
// numberOfKeysOwnedByInstance returns how many of the supplied keys are owned by given instance. | |||
func (r *Ring) numberOfKeysOwnedByInstance(keys []uint32, op Operation, instanceID string, bufDescs []InstanceDesc, bufHosts []string, bufZones []string) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only called from tests. I've moved most of this logic to the test in question (token_range_test.go
).
35571f9
to
e17f2cc
Compare
@@ -627,27 +628,31 @@ func TestRing_Get_ZoneAwareness(t *testing.T) { | |||
"should succeed if there are enough instances per zone on RF = 3": { | |||
numInstances: 16, | |||
numZones: 3, | |||
totalZones: 3, | |||
replicationFactor: 3, | |||
zoneAwarenessEnabled: true, | |||
expectedInstances: 3, | |||
}, | |||
"should fail if there are instances in 1 zone only on RF = 3": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the change to add an instance in each zone below, this test would fail.
Previously, findInstancesForKey
ignored the actual number of zones, len(r.ringTokensByZone)
, and assumed that replication factor was equivalent. Now that we've changed things so that replication factor is not necessarily the same as the number of zones, we need to add at least one instance in each zone so that findInstancesForKey
is aware of them and only picks a single instance from each zone.
distinctHosts = bufHosts[:0] | ||
distinctZones = bufZones[:0] | ||
|
||
// TODO: Do we need to pass this in to avoid allocations? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to replace passing bufDescs, bufHosts, bufZones
around everywhere with a single struct Buffers
which could include this map but that was a pretty invasive and non-backwards compatible change (not that we have any guarantees in dskit).
This change adds a new method that accepts 0 or more `Option` instances that modify the behavior of the call. These options can (currently) be used to adjust the replication factor for a particular key or use buffers to avoid excessive allocation. The most notable changes are in the `Ring.findInstancesForKey` method which is the core of the `Ring.Get` method. Instead of keeping track of distinct zones and assuming that only a single instance per zone would ever be returned, we keep a map of the number of instances found in each zone. Part of grafana/mimir#9944
Signed-off-by: Nick Pillitteri <[email protected]>
6c1dec6
to
7b887a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only reviewed the prod changes so far
|
||
// SupportsExpandedReplication returns true for replication strategies that | ||
// support increasing the replication factor beyond a single instance per zone, | ||
// false otherwise. | ||
SupportsExpandedReplication() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know if this won't break clients with their own ReplicationStrategy
implementations? I'm wondering if we shouldn't add a new interface and check with type assertions in getReplicationSetForKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will break existing ReplicationStrategy
implementations but it's an obvious and simple fix. We could add a new type but I'm not sure what that gets us besides not requiring changes from people that have their own ReplicationStrategy
implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's the only thing I can think of. A separate method in the existing interface is also fine as long as the list of optional features doesn't keep growing. (feel free to go either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you mean. I'll leave it as-is for now and we can revisit if we end up having more than just this feature we need to test against.
distinctZones = bufZones[:0] | ||
|
||
// TODO: Do we need to pass this in to avoid allocations? | ||
hostsPerZone = make(map[string]int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Can't we still use a slice with a struct? Have you ran BenchmarkRing_Get
(maybe also adding a test case there with a different RF)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we don't need to pass this in to avoid an allocation 🤷 . In the new version, Get
ends up being ~7% slower than the old version
$ benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: github.com/grafana/dskit/ring
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Ring_Get/with_zone_awareness-16 1.145µ ± ∞ ¹ 1.233µ ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 835.5n ± ∞ ¹ 943.5n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 881.8n ± ∞ ¹ 934.8n ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 647.6n ± ∞ ¹ 676.1n ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 859.7n 926.0n +7.71%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Ring_Get/with_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/one_excluded_zone-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
Ring_Get/without_zone_awareness,_not_enough_instances-16 0.000 ± ∞ ¹ 0.000 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean ³ +0.00% ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean
What this PR does:
This change adds a new method that accepts 0 or more
Option
instances that modify the behavior of the call. These options can (currently) be used to adjust the replication factor for a particular key or use buffers to avoid excessive allocation.The most notable changes are in the
Ring.findInstancesForKey
method which is the core of theRing.Get
method. Instead of keeping track of distinct zones and assuming that only a single instance per zone would ever be returned, we keep a map of the number of instances found in each zone.Which issue(s) this PR fixes:
Part of grafana/mimir#9944
Previous attempt #620
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]